This data set contains 113,937 loans with 81 variables on each loan, including loan amount, borrower rate (or interest rate), current loan status, borrower income, and many others. The dataset contains so-called listings, which either have been converted into a loan or not. Partially funded loans are possible as well. My main interest is who becomes a so-called Prosper borrower and why, and furthermore what mainly influences the interest rate. It would also be interesting to see how the average Prosper rate compares to the conventional financial market.
Some information about this dataset can be found at the following Google Drive link: https://drive.google.com/file/d/1SnL9_i9fbUx2M2MYTxwZ5jyLbOeLvQAD/view?usp=sharing
Bid Object: A Bid is created when a Lender wishes to lend money to a Borrower in response to a Listing the Borrower created to solicit Bids. Bids are created by specifying an Amount and a Minimum Rate that the Lender wishes to receive should the Bid win the auction and become a Loan. The Minimum Rate remains private unless the Bid is outbid by other Bids offering a lower Minimum Rate.
Category Object: A Category is a collection of Groups which share a common interest or affiliation. Categories are created by the Prosper Team. Group Leaders can associate their Group with one or more Categories as they relate to their Group.
CreditProfile Object: A CreditProfile is a timestamped set of extended credit information for a Member. Row level display and publication of CreditProfile is explicitly forbidden.
Group Object: A Group is a collection of Members who share a common interest or affiliation. Groups are managed by Group Leaders who bring borrowers to Prosper, maintain the group's presence on the site, and collect and/or share Group Rewards. Borrowers who are members of a group often get better interest rates because Lenders tend to have more confidence in Borrowers that belong to trusted Groups.
Listing Object: A Listing is created by a Borrower to solicit bids by describing themselves and the reason they are looking to borrow money. If the Listing receives enough bids by Lenders to reach the Amount Requested, then after the Listing period ends it will become a Loan. A Borrower may only have one active listing at a particular moment in time.
Loan Object: A Loan is created when a Borrower has received enough Bids to meet the full amount of money that the Borrower requested in their Listing. The Borrower must then make payments on the Loan to keep its status current.
Loan Performance Object: A LoanPerformance is an event in a Loan History that causes a change in loan value. This table can be used to calculate roll rates. Row level display and publication of LoanPerformance is explicitly forbidden.
Marketplace Object: The Marketplace is a collection of metrics and statistics about the Prosper Marketplace. These metrics are calculated daily. Historical metrics are provided as well.
Member Object: A Member is a registered user of the Prosper Marketplace site. A Member may have one or multiple roles, which determine the actions the Member is allowed to perform on the site.
Based on my high level questions I think these are the main attributes:
I examined the structure of the dataset using the mentioned sources and categorized 3 main areas with the following attributes. The main attributes are shown in bold below. I assume the other attributes help to explain variations and patterns observed in the data. However, they might not be taken into consideration, depending on the analysis.
* **ProsperScore**: A custom risk score built using historical Prosper data. The score ranges from 1-10, with 10 being the best, or lowest risk score. Applicable for loans originated after July 2009.
* **Term**: The length of the loan expressed in months.
* **LoanStatus**: The current status of the loan: Cancelled, Chargedoff, Completed, Current, Defaulted, FinalPaymentInProgress, PastDue. The PastDue status will be accompanied by a delinquency bucket.
* ClosedDate: Closed date is applicable for Cancelled, Completed, Chargedoff and Defaulted loan statuses.
* **LoanOriginalAmount**: The origination amount of the loan.
* **MonthlyLoanPayment**: The scheduled monthly loan payment.
* PercentFunded: Percent the listing was funded.
* InvestmentFromFriendsCount: Number of friends that made an investment in the loan.
* InvestmentFromFriendsAmount: Dollar amount of investments that were made by friends.
* Investors: The number of investors that funded the loan.
* CreditScoreRangeUpper: The upper value representing the range of the borrower's credit score as provided by a consumer credit rating agency.
* CurrentCreditLines: Number of current credit lines at the time the credit profile was pulled.
* OpenCreditLines: Number of open credit lines at the time the credit profile was pulled.
* TotalCreditLinespast7years: Number of credit lines in the past seven years at the time the credit profile was pulled.
* InquiriesLast6Months: Number of inquiries in the past six months at the time the credit profile was pulled.
* CurrentDelinquencies: Number of accounts delinquent at the time the credit profile was pulled.
* AmountDelinquent: Dollars delinquent at the time the credit profile was pulled.
* DelinquenciesLast7Years: Number of delinquencies in the past 7 years at the time the credit profile was pulled.
A custom risk score built using historical Prosper data. The score ranges from 1-10, with 10 being the best, or lowest risk score. Applicable for loans originated after July 2009.
Comment: The other score counts are roughly centred around "6", which separates lower risks from higher risks, although 8 and 4 are very prominent as well.
The employment status of the borrower at the time they posted the listing.
Index(['Not employed', 'Other', 'Part-time', 'Retired', 'Self-employed',
'Employed (full-time)'],
dtype='object')
Comment: Most borrowers are "Employed (full-time)"; only a small proportion is "Self-employed". The proportion of "Part-time" workers is even smaller.
The two-letter abbreviation of the state of the address of the borrower at the time the Listing was created.
Comment: OK, California clearly leads this distribution; it might be that all of those "Computer Programmers" are located in Silicon Valley.
The category of the listing that the borrower selected when posting their listing: 0 - Not Available, 1 - Debt Consolidation, 2 - Home Improvement, 3 - Business, 4 - Personal Loan, 5 - Student Use, 6 - Auto, 7- Other, 8 - Baby&Adoption, 9 - Boat, 10 - Cosmetic Procedure, 11 - Engagement Ring, 12 - Green Loans, 13 - Household Expenses, 14 - Large Purchases, 15 - Medical/Dental, 16 - Motorcycle, 17 - RV, 18 - Taxes, 19 - Vacation, 20 - Wedding Loans
Comment: This is not a surprise. The various articles I read about P2P lending often mention that this type of loan is ideal for consolidating different loans, including credit card debt. Here we can see that more than half of all loans are used for that purpose. Also here, merge "Not Available" into "Other".
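The recoding described above can be sketched roughly as follows. This is a minimal illustration on toy data, not the notebook's actual code: the source column name `ListingCategory (numeric)` and the helper `label_listing_category` are assumptions, while the code-to-label mapping is copied from the variable description.

```python
import pandas as pd

# Code-to-label mapping copied from the variable description above.
LISTING_CATEGORIES = {
    0: "Not Available", 1: "Debt Consolidation", 2: "Home Improvement",
    3: "Business", 4: "Personal Loan", 5: "Student Use", 6: "Auto",
    7: "Other", 8: "Baby&Adoption", 9: "Boat", 10: "Cosmetic Procedure",
    11: "Engagement Ring", 12: "Green Loans", 13: "Household Expenses",
    14: "Large Purchases", 15: "Medical/Dental", 16: "Motorcycle",
    17: "RV", 18: "Taxes", 19: "Vacation", 20: "Wedding Loans",
}


def label_listing_category(codes: pd.Series) -> pd.Series:
    """Translate numeric codes to labels and fold 'Not Available' into 'Other'."""
    labels = codes.map(LISTING_CATEGORIES)
    return labels.replace({"Not Available": "Other"})


# Toy rows standing in for the real data (column name is an assumption).
toy = pd.DataFrame({"ListingCategory (numeric)": [0, 1, 1, 7, 2]})
toy["ListingCategory_alpha"] = label_listing_category(toy["ListingCategory (numeric)"])
category_counts = toy["ListingCategory_alpha"].value_counts()
```

After the merge, "Not Available" no longer appears as its own bar, and its rows are counted under "Other".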
The Borrower's Annual Percentage Rate (APR) for the loan.
Comment: Difficult... there are a lot of values at the upper end, in the bins of 0.35 - 0.36 (which, by the way, is more than 30%). However, looking at the distribution, I would say it can be considered approximately normally distributed.
The Borrower's interest rate for this loan.
Comment: Very similar to the APR, which is no surprise, as the APR includes the fees together with the rate, i.e. everything the borrower needs to pay. It actually looks a bit better than the APR, as there is only one spike on the far right, in the bin of 0.33. This might be related to the spike of the BorrowerAPR.
Comment: The 36-month term is clearly dominant, followed by 60 months. The 12-month term seems to be not that important.
Comment: Most of the loans are in good shape, either "Completed" or "Current". Some of the loans are in a Past Due category, indicating that the borrower is behind the payment schedule.
Finally, a considerable number of loans have defaulted (this comes after Past Due), and even more are finally Charged Off (approx. 6%), which indicates the potential risk.
The debt to income ratio of the borrower at the time the credit profile was pulled. This value is Null if the debt to income ratio is not available. This value is capped at 10.01 (any debt to income ratio larger than 1000% will be returned as 1001%). The lower the better!
Comment: The log-transformed scale works better and shows a normal-looking distribution.
Comment: The Q-Q plot and the standard test aren't perfect, but the log10-scale plot definitely looks more normal than the original scale.
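One way to back up the visual impression is to compare the skewness before and after the log10 transform. The sketch below uses synthetic, right-skewed values rather than the Prosper data, so the numbers it produces say nothing about the real column; only the technique carries over.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed ratios; these are NOT the Prosper values.
dti = pd.Series(rng.lognormal(mean=-1.5, sigma=0.6, size=5000),
                name="DebtToIncomeRatio")
dti = dti.clip(upper=10.01)  # the dataset caps the ratio at 10.01

# log10 is only defined for positive values; zeros would need special handling.
dti_log10 = np.log10(dti)

skew_raw = dti.skew()        # clearly positive for right-skewed data
skew_log = dti_log10.skew()  # close to 0 if the log scale is roughly symmetric
```

A skewness near zero on the log scale supports reading the transformed histogram as roughly bell-shaped, although it is not a formal normality test.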
The income range of the borrower at the time the listing was created.
$50,000-74,999    24030
$25,000-49,999    22023
$100,000+         14013
$75,000-99,999    13644
$1-24,999          3840
Not employed          1
Name: IncomeRange, dtype: int64
Comment: I think it could be better to look at the actual distribution of the income the borrowers have at hand at the time of the listing creation. Let's have a look at StatedMonthlyIncome.
The monthly income the borrower stated at the time the listing was created.
count     77551.00
mean       5940.30
std        4000.82
min           0.25
25%        3533.33
50%        5000.00
75%        7166.67
max      158333.33
Name: StatedMonthlyIncome, dtype: object
Comment: A few outliers stretch the shape far to the right. 943 "Monthly Incomes" are higher than $20,000.
Comment: The Q-Q plot does not really support normality. However, the log-transformed plot follows a "bell shape" much more closely than the original scale. Let's check this more closely in the upcoming analysis.
count    77551.000000
mean       295.766419
std        189.328517
min          0.000000
25%        158.490000
50%        256.390000
75%        392.280000
max       2251.510000
Name: MonthlyLoanPayment, dtype: float64
Comment: The log transformation seems to describe the values pretty well.
The origination amount of the loan.
count    77108.000000
mean      9300.966683
std       6398.995358
min       1000.000000
25%       4000.000000
50%       8000.000000
75%      14584.500000
max      35000.000000
Name: LoanOriginalAmount, dtype: float64
Seems not to be normal...
Looks a bit better, but it is still not the ideal bell curve or close to it.
Comment: OK, it looks better but is still not normal, so I decided to work with the original scale.
Percent the listing was funded.
count    77108.000000
mean         0.998156
std          0.020428
min          0.700000
25%          1.000000
50%          1.000000
75%          1.000000
max          1.012500
Name: PercentFunded, dtype: float64
Comment: Most of the time the full amount was paid out.
The estimated return assigned to the listing at the time it was created. Estimated return is the difference between the Estimated Effective Yield and the Estimated Loss Rate. Applicable for loans originated after July 2009.
Comment: ok the Estimated return is sometimes 0 or negative. Apart from that it looks pretty normal. What about the Loss?
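The definition above explains why zero or negative values occur: whenever the estimated loss rate reaches or exceeds the estimated effective yield, the difference is no longer positive. A small worked sketch, using made-up yield/loss pairs and a hypothetical helper name:

```python
def estimated_return(effective_yield: float, loss_rate: float) -> float:
    """Estimated return = estimated effective yield - estimated loss rate."""
    return effective_yield - loss_rate


# Illustrative (made-up) yield/loss pairs, expressed as fractions.
examples = [(0.15, 0.05), (0.10, 0.10), (0.12, 0.16)]

# Rounded to 4 decimals to hide floating-point noise.
returns = [round(estimated_return(y, l), 4) for y, l in examples]
# returns == [0.1, 0.0, -0.04]: positive, zero, and negative estimated return
```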
Estimated loss is the estimated principal loss on charge-offs. Applicable for loans originated after July 2009.
Comment: The original scale looks right-skewed, and after the log transform it looks left-skewed. We will look at it in the further analysis and stay with the original scale.
Cleaning General:
The following attributes have been excluded: ListingKey, CreditGrade, CurrentlyInGroup, GroupKey, DateCreditPulled, FirstRecordedCreditLine, OpenRevolvingAccounts, OpenRevolvingMonthlyPayment, TotalInquiries, PublicRecordsLast10Years, PublicRecordsLast12Months, TotalTrades, TradesNeverDelinquent (percentage), TradesOpenedLast6Months, LoanKey, LoanCurrentDaysDelinquent, LoanFirstDefaultedCycleNumber, LoanMonthsSinceOrigination, LoanNumber, LoanOriginationQuarter, LP_CustomerPayments, LP_CustomerPrincipalPayments, LP_InterestandFees, LP_ServiceFees, LP_CollectionFees, LP_GrossPrincipalLoss, LP_NetPrincipalLoss, LP_NonPrincipalRecoverypayments.
Prosper Rating:
As 25% of the values are not populated, because the Prosper Rating only started after July 2009, I thought that a simple rule (or even a regression) deriving the Prosper Rating from the external credit score would be easy. But it is not, as e.g. D lies between 680 and 699 and E appears further down as well. So let's flag those loans as before_July09, analyze them alongside the rest, and decide at the end whether to keep them or get rid of them.
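A flag like this could be derived along the following lines. This is only a sketch on toy rows: the date column `LoanOriginationDate` and the exact cutoff are assumptions (the documentation only says "after July 2009"), so the boundary should be checked against where the rating is actually missing.

```python
import pandas as pd

# Hypothetical cutoff: loans originated before this date are treated as
# pre-rating loans. The exact boundary is an assumption.
CUTOFF = pd.Timestamp("2009-08-01")

toy = pd.DataFrame({
    "LoanOriginationDate": pd.to_datetime(
        ["2008-03-15", "2009-07-31", "2009-08-01", "2013-01-10"]),
    "ProsperScore": [None, None, 7.0, 4.0],
})

# True for loans that predate the Prosper rating and score.
toy["before_July09"] = toy["LoanOriginationDate"] < CUTOFF

# Share of flagged loans, and a sanity check that flagged rows lack a score.
share_flagged = toy["before_July09"].mean()
flagged_have_no_score = toy.loc[toy["before_July09"], "ProsperScore"].isna().all()
```

Keeping the rows and carrying a boolean flag makes it cheap to run each later analysis with and without the older loans.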
BorrowerAPR: There are a lot of values at the upper end, in the bins of 0.35 - 0.36 (which, by the way, is more than 30%). However, looking at the distribution, I would say it can be considered approximately normally distributed.
DebtToIncomeRatio: This doesn't look normally distributed at all. As it's a financial KPI and the original scale is strongly right-skewed, a log scale might explain the distribution better.
StatedMonthlyIncome: The Q-Q plot does not really support normality. However, the log-transformed plot follows a "bell shape" much more closely than the original scale.
MonthlyLoanPayment: There are some extremely low values below $10, and they are spread over different categories such as Completed. However, within those categories the proportion of 0 values is very low. On the log scale the data looks fairly normal.
EstimatedReturn: The estimated return is sometimes 0 or negative. Apart from that it looks pretty normal.
EstimatedLoss: The data looks right-skewed; after the log transform it looks left-skewed, so I decided to stay with the original scale.
Now let's move on to the bivariate exploration. First, let's see if there is any insight in the following attribute comparisons.
OK, the correlation matrix and the pair plot show some good candidates for continuous comparison. I continue based on the questions I formulated.
Occupation BorrowerState Denominator: 77108
Comment: As expected, the pivot of state and occupation shows that the groups "Others" and "Professionals" coming from California (CA) are the most frequent. We have more computer programmers in California than in other states.
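A pivot like this can be built with a cross-tabulation whose cells are divided by the total row count (the "Denominator" shown above). The snippet is a sketch on toy rows, not the notebook's own helper:

```python
import pandas as pd

# Toy rows standing in for the real Occupation / BorrowerState columns.
toy = pd.DataFrame({
    "Occupation": ["Other", "Other", "Professional", "Computer Programmer", "Other"],
    "BorrowerState": ["CA", "TX", "CA", "CA", "CA"],
})

# Absolute counts per (occupation, state) pair.
counts = pd.crosstab(toy["Occupation"], toy["BorrowerState"])

# Share of all loans per pair: divide by the overall denominator.
shares = counts / len(toy)

# Most frequent occupation within California.
top_occupation_in_ca = shares["CA"].idxmax()
```

Because every cell is divided by the same denominator, the shares over the whole table add up to 1 and can be read directly as proportions of all loans.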
Let's see if there is an interesting insight for borrowers owning a house. Do they own houses?
Are there states with more homeowners in the Prosper Community?
Comment: Homeowners are much less common in California (CA) than in e.g. Texas; in Florida the split is nearly even. The smaller the overall Prosper utilization gets, the bigger the homeowner count gets (with a few exceptions).
I'm not a US expert, but comparing e.g. California and NY with WA also means comparing metropolitan areas with the countryside. Which occupations do homeowners have?
Are there occupation groups having more houses?
Comment: The group "Others" has fewer homeowners than e.g. "Professionals", the 2nd largest group. Executives tend to own more real estate as well. Overall, we know that the split between homeowners and non-homeowners is nearly even. Which employment status do the different occupations have?
Which Occupations have which Employment Status?
Occupation EmploymentStatus Denominator: 77108
Comment: In the biggest group, "Others", most borrowers are "Employed". The following groups, "Professional", "Comp. Progr." and "Executive", have no (or a negligible) count of not employed or retired borrowers. Even the part-time proportion is very low. So the bottom line is that the Prosper community seems to be full-time employed.
There might be a relationship between state and listing type?
ListingCategory_alpha BorrowerState Denominator: 77108
Comment: Nothing special here: California and Debt Consolidation collect most of the counts. Debt consolidation seems to be leading.
Do we have occupation categories doing a particular type of loan?
Are there professions that choose a particular ListingCategory more often?
ListingCategory_alpha Occupation Denominator: 77108
Comment: The "Other" categories are kind of dominating. Professionals mostly do Debt Consolidation, followed by Home Improvement and Business. Not a big insight, as we already know that the consolidation of loans is the strongest use case. Do homeowners do different things with the money?
Are homeowners more in the Home Improvement business?
Comment: Debt Consolidation is highest for homeowners as well. Surprisingly, Home Improvement loans are taken by non-homeowners as well (approx. 2.5%).
OK, not too many interesting insights so far; let's explore some quantitative attributes further.
How much money do the different occupations have at hand?
count    77108.000000
mean         3.702410
std          0.257981
min         -0.602060
25%          3.549192
50%          3.698970
75%          3.855317
max          5.199572
Name: StatedMonthlyIncome_ln, dtype: float64
Comment: The histograms above are ordered descending by the sum of available income in each group. Interestingly, the mean income (red line) has a different order. Let's look at that.
Comment: Ok, doctors have the most available income ...
Comment: However Professionals, Computer Programmers and the very large group Others are still above the average income.
Let's see what the overall debt situation looks like.
Let's neglect the outliers for the moment and have a look at the values up to the 3rd quartile.
count    77108.000000
mean         0.259043
std          0.318858
min          0.010000
25%          0.150000
50%          0.220000
75%          0.320000
max         10.010000
Name: DebtToIncomeRatio, dtype: float64
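Restricting the view to values up to the 3rd quartile can be done with a quantile-based mask. A minimal sketch on a handful of made-up ratios:

```python
import pandas as pd

# Made-up ratios including two extreme values, standing in for the real column.
dti = pd.Series([0.05, 0.10, 0.15, 0.22, 0.30, 0.32, 0.45, 3.0, 10.01],
                name="DebtToIncomeRatio")

# 75th percentile (3rd quartile) of the ratio.
q3 = dti.quantile(0.75)

# Keep only the values at or below the 3rd quartile for plotting.
core = dti[dti <= q3]
```

The mask only affects what is plotted; the extreme ratios stay in the data and can be inspected separately.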
Comment: Teacher's Aides, Waiters, Students and Food Service have the highest ratio. Computer Programmers and Professionals are below the average ratio; however, the large group of Others is slightly above. What about the extremes > 0.3?
Comment: We have 323 values above 1, which means those borrowers have debts equal to (or even exceeding) what they earn. There seems to be a concentration between 1 and 3. I don't consider them, as they make up only a small proportion. Let's look at the high ratios.
Comment: Food Service, Sales/Retail, Clerical, Teachers as well as Others, Computer Programmers and Professionals have a considerable number of high ratios. What does the score look like for the occupations?
How much money do they borrow on average?
count    77108.000000
mean      9300.966683
std       6398.995358
min       1000.000000
25%       4000.000000
50%       8000.000000
75%      14584.500000
max      35000.000000
Name: LoanOriginalAmount, dtype: float64
First let's compare the 2 main scoring attributes, ProsperScore and ProsperRating.
Comment: The darker area roughly shows the relationship between the two. However, a Prosper Rating of e.g. "C" covers a score range from 2 to 11.
Occupation ProsperRating_Alpha Denominator: 77108
Comment: The majority of the different professions are concentrated in the area from C to A. What is the Score telling us?
Occupation ProsperScore Denominator: 77108
Comment: It ranges from 2 to 10, so the extremes 1 and 11 are seen much less often. Especially the usual suspects (Others, Professionals, Executives, Computer Programmers, Teachers etc.) underline that.
Summary Who is using Prosper:
In general the occupation group "Other" dominates the listings, whereas the other occupations each make up a small proportion. This is a pity, as it might be due to the fact that users can simply pick the "Other" category from a drop-down list box (ddlb).
Let's move on to my 2nd question and pull some attributes for bivariate analysis.
I immediately had 2 things in mind as to why borrowers use Prosper instead of a normal bank. As we already know that Debt Consolidation is the most frequent listing category, i.e. loan type, what makes it so attractive? Is it the time from the application to the point of receiving the money? I call that time to money.
count    77108.000000
mean        11.102674
std         17.459497
min          0.000000
25%          4.000000
50%          7.000000
75%         12.000000
max        530.000000
Name: Time2Money, dtype: float64
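A Time2Money attribute of this kind could be derived as the number of whole days between listing creation and loan origination. The sketch below assumes the two date columns are called `ListingCreationDate` and `LoanOriginationDate` and that times of day are ignored; both are assumptions about the preprocessing, not a statement of how the notebook computed it.

```python
import pandas as pd

# Toy listings; the real columns would first be parsed with pd.to_datetime.
toy = pd.DataFrame({
    "ListingCreationDate": pd.to_datetime(
        ["2013-01-01 09:30", "2013-02-10 18:00", "2013-03-05 00:00"]),
    "LoanOriginationDate": pd.to_datetime(
        ["2013-01-08", "2013-02-14", "2013-03-05"]),
})

# Drop the time of day so the difference counts calendar days.
elapsed = (toy["LoanOriginationDate"].dt.normalize()
           - toy["ListingCreationDate"].dt.normalize())

# Whole days from listing creation to payout.
toy["Time2Money"] = elapsed.dt.days
```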
Comment: Fast is something different; however, the Prosper platform seems to accelerate as the number of listings increases. There is a clear downward trend in the Time2Money attribute, which might attract borrowers. What else could be a reason to use Prosper? For sure the money you need to pay, which can be summarized by the Borrower Annual Percentage Rate.
Comment: At the end of 2011 the rates were considerably higher than before, and it seems that this was the end of an upward trend; until mid-2014 they went down constantly. Is there a difference by occupation?
Students from a technical school have the best mean APR, followed by e.g. Judges, Doctors and Investors. However, one can see that the proportion of those groups isn't very high. The usual suspects in this dataset (Others, Professionals etc.) come later. Let's check the distribution of the means and bring them into an order.
The means of the occupations seem to be nicely normally distributed... let's review them.
Comment: It's not a bargain: the avg. Borrower APR is approx. 22%. However, Computer Programmers and Executives are getting far better rates than Professionals and the famous Others. Let's look if there is a relation to the score.
Summary: Why is Prosper used?
Finally, let's move on to question ....
There are some "no-brainers" which we can check first.
Comment: The distributions climb nicely as the rating decreases. Most of the distributions are bi- or multimodal. The spread in each category is large as well. In category AA we find many outliers to the right. A has them as well, and C and D have outliers in both directions. E has outliers to the left. Again, see below the density and violin plot for "E" as an example! HR has a relatively small IQR and many outliers to the left and to the right.
So what else influences the APR?
Comment: As we know from the beginning, the correlation coefficient is 0.128, so we have a weak linear relation. Below a ratio of 2 the rates cover the full range, so we need to zoom in a bit. One thing that looks strange is the nearly vertical line between 0.35 and approx. 0.37 along all DTIR values.
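For orientation, a Pearson coefficient of 0.128 corresponds to an r² of roughly 0.016, i.e. a straight line in the ratio would account for under 2% of the variance in the rate. The sketch below shows how such a coefficient is computed with pandas, using synthetic data with a deliberately weak dependence; the generated values are not the Prosper data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 2000

# Synthetic ratio and APR: a weak dependence plus a lot of unrelated noise.
dti = rng.uniform(0.05, 0.6, size=n)
apr = 0.20 + 0.05 * dti + rng.normal(0.0, 0.07, size=n)
toy = pd.DataFrame({"DebtToIncomeRatio": dti, "BorrowerAPR": apr})

# Series.corr uses the Pearson coefficient by default.
r = toy["DebtToIncomeRatio"].corr(toy["BorrowerAPR"])

# Share of variance a linear fit would explain.
r_squared = r ** 2
```

A low coefficient only rules out a strong linear relation; patterns such as the vertical band mentioned above still need to be inspected visually.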
Below a DTIR of 1 and 0.5 there seems to be a concentration of rates... let's check quickly in a heatmap.
Comment: The majority of the values are concentrated between 0.1 and 0.4 and 0.10 to 0.25. We can see a slight upward trend in the concentrated area, but the rate must also be influenced by something else. Maybe the term?
We know from the univariate analysis that 12-month terms are very rare, but their distribution starts at lower rates. The 36-month term seems to be multimodal and the 60-month term right-skewed. Still, all 3 terms overall give a wide range of rates. In the 36-month term we see a 3rd mode which looks similar to the concentration we saw in the DTIR between 0.35 and 0.37; let's look closer. The medians are close to each other, and the range for 60 months is smaller.
Comment: So this might explain the peak line we saw in the DTIR plot that I mentioned before; however, either it is a combination of attributes or it is another attribute which primarily drives the rate. Let's check the score.
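Comment: We can see that the medians of the rates increase as the score decreases. However, a few things are interesting here.
The scores 7 to 3 have relatively wide IQRs and (nearly) no outliers, whereas 10 to 8 and 2 to 1 have many outliers to the right and to the left. The groups 11 and 1 are relatively small. So there is still something else which controls the rates. Let's look at the income.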
Comment: The approx. 10% without a verified income do get higher rates. So having all documents ready helps here as well. What else could influence the rate? Property?
Comment: Owning a house definitely helps. We can clearly see that the median rate is lower for homeowners. OK, let's finally check the amount requested here.
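The median comparison read off the plot can also be computed directly with a group-by. A minimal sketch with made-up rates (the column names follow the dataset, the values do not):

```python
import pandas as pd

# Made-up rates for three homeowners and three non-homeowners.
toy = pd.DataFrame({
    "IsBorrowerHomeowner": [True, True, True, False, False, False],
    "BorrowerAPR": [0.12, 0.18, 0.21, 0.19, 0.25, 0.33],
})

# Median APR per group, converted to a plain dict for easy lookup.
medians = toy.groupby("IsBorrowerHomeowner")["BorrowerAPR"].median().to_dict()

# Positive gap means non-homeowners pay the higher median rate.
gap = medians[False] - medians[True]
```

The same pattern works for `IncomeVerifiable` or any other yes/no attribute by swapping the grouping column.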
count    77108.000000
mean         3.702410
std          0.257981
min         -0.602060
25%          3.549192
50%          3.698970
75%          3.855317
max          5.199572
Name: StatedMonthlyIncome_ln, dtype: float64
Comment: We can see on the log scale that the most frequent area runs from 1900 to above 20000 (this is monthly income). However, the range of APR still goes from 5% to above 40%. Having a lot of money per month is no indication of getting better rates.
Comment: We see that the fit isn't really good, but it emphasizes a bit that the lower the income, the higher the rate. In the heatmap we see that the range goes from 0.1 to 0.4 through all income levels. Let's revisit the categorical income range.
$50,000-74,999    23898
$25,000-49,999    21881
$100,000+         13943
$75,000-99,999    13576
$1-24,999          3809
Not employed          1
Name: IncomeRange, dtype: int64
Comment: This makes it really visible that the Prosper platform is used by well-earning borrowers; note the spike between 0.3 and 0.4. Having a higher income definitely helps to get the best rates; however, there is much variance in the distributions. The proportion of borrowers below 25k, and even unemployed ones, is very small.
count    77108.000000
mean       297.465653
std        188.535826
min          0.240000
25%        159.770000
50%        258.030000
75%        392.810000
max       2251.510000
Name: MonthlyLoanPayment, dtype: float64
Comment: The monthly payment seems to have a limit at approx. $1,000; most of the rates are concentrated below it. A slight trend can be seen that higher monthly payments go along with lower rates. Also notice the peak between 0.3 and 0.4 at a payment of around 200!
count    77108.000000
mean      9300.966683
std       6398.995358
min       1000.000000
25%       4000.000000
50%       8000.000000
75%      14584.500000
max      35000.000000
Name: LoanOriginalAmount, dtype: float64
Comment: The higher the loan, the smaller the spread of rates. Please note the peak between 0.3 and 0.4 up to 5000! What else can influence the rate? Yes, the estimate of loss and return for the lender.
Comment: Very nice correlation...let's see a fitted line
Comment: The smaller the estimated loss, the smaller the APR: the two rise and fall together, so this is a strong positive relationship. So what about the estimated return?
Comment: Surprisingly, the estimated return shows a negative relationship to the APR. Why is that? It comes from the way it is calculated: the estimated return is the difference between the estimated effective yield and the estimated loss rate.
Summary Who is using Prosper:
In general the occupation group "Other" dominates the listings, whereas the other occupations each make up a small proportion. This is a pity, as it might be due to the fact that users can simply pick the "Other" category from a drop-down list box (ddlb).
Summary: Why is Prosper used?
What is influencing the Rate?
* `BorrowerAPR vs. ProsperRating`: The distributions climb nicely as the rating decreases. Most of the distributions are bi- or multimodal. The spread in each category is large as well. In category AA we find many outliers to the right. A has them as well, and C and D have outliers in both directions. E has outliers to the left. Again, see below the density and violin plot for "E" as an example! HR has a relatively small IQR and many outliers to the left.
* `BorrowerAPR vs. DebtToIncomeRatio`: As we know from the beginning, the correlation coefficient is 0.128, so we have a weak linear relation. Below a ratio of 2 the rates cover the full range, so we need to zoom in a bit. One thing that looks strange is the nearly vertical line between 0.35 and approx. 0.37 along all DTIR values. The majority of the values are concentrated between 0.1 and 0.4 and 0.10 to 0.25. We can see a slight upward trend in the concentrated area, but the rate must also be influenced by something else.
* `BorrowerAPR vs. Term`: We know from the univariate analysis that 12-month terms are very rare, but their distribution starts at lower rates. The 36-month term seems to be multimodal and the 60-month term right-skewed. Still, all 3 terms overall give a wide range of rates. In the 36-month term we see a 3rd mode which looks similar to the concentration we saw in the DTIR between 0.35 and 0.37. The medians are close to each other, and the range for 60 months is smaller. So this might explain the peak line we saw in the DTIR plot; however, either it is a combination of attributes or it is another attribute which primarily drives the rate.
* `ProsperScore`: We can see that the medians of the rates increase as the score decreases. However, a few things are interesting here.
* The scores 7 to 3 have relatively wide IQRs and (nearly) no outliers.
* Whereas 10 to 8 and 2 to 1 have many outliers to the right and to the left.
* The groups 11 and 1 are relatively small. So there is still something else which controls the rates.
* `BorrowerRate vs. IncomeVerifiable`: The approx. 10% without a verified income do get higher rates. So having all documents ready helps here as well.
* `BorrowerAPR vs. IsBorrowerHomeowner`: Owning a house definitely helps. We can clearly see that the median rate is lower for homeowners.
* `BorrowerAPR vs. StatedMonthlyIncome`: We can see on the log scale that the most frequent area runs from 1900 to above 20000 (this is monthly income). However, the range of APR still goes from 5% to above 40%. Having a lot of money per month is no indication of getting better rates. We see that the fit isn't really good, but it emphasizes a bit that the lower the income, the higher the rate. In the heatmap we see that the range goes from 0.1 to 0.4 through all income levels.
* `BorrowerAPR vs. IncomeRange`: This makes it really visible that the Prosper platform is used by well-earning borrowers; note the spike between 0.3 and 0.4. Having a higher income definitely helps to get the best rates; however, there is much variance in the distributions. The proportion of borrowers below 25k, and even unemployed ones, is very small.
* `BorrowerAPR vs. MonthlyPayment`: The monthly payment seems to have a limit at approx. $1,000; most of the rates are concentrated below it. A slight trend can be seen that higher monthly payments go along with lower rates.
* `BorrowerAPR vs. LoanAmount`: The higher the loan, the smaller the spread of rates.
* `BorrowerAPR vs. EstimatedLoss`: The smaller the estimated loss, the smaller the APR: the two rise and fall together, so this is a strong positive relationship.
* `Estimated Return`: Surprisingly, the estimated return shows a negative relationship to the APR. Why is that? It comes from the way it is calculated: the estimated return is the difference between the estimated effective yield and the estimated loss rate.
Yes the Income Range was underlining the relationship of high incomes to better rates.
Let's put together some demographic attributes and some attributes related to financials.
Comment: Very interesting to see the demographical aspects together.
11.15765999157341
Comment: So the 90-day moving average indicates a nearly constant level of approx. 11 days. Is there a change by the level of debt and rating...
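A 90-day moving average like the one discussed can be obtained with a time-based rolling window, provided the series is indexed by a sorted date. The sketch below uses one synthetic value per day; the real data would have many loans per day and would need sorting by the date column first (which date the notebook used as the index is not stated, so that part is an assumption).

```python
import pandas as pd

# Synthetic daily series: Time2Money cycles through 10..14 days.
dates = pd.date_range("2013-01-01", periods=200, freq="D")
toy = pd.DataFrame({"Time2Money": [10 + (i % 5) for i in range(200)]},
                   index=dates)

# Time-based window: each point averages the observations of the last 90 days.
rolling_mean = toy["Time2Money"].rolling("90D").mean()

# Value of the moving average at the most recent date.
latest_average = rolling_mean.iloc[-1]
```

Using an offset such as `"90D"` instead of a fixed row count keeps the window 90 calendar days wide even when the number of loans per day varies.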
Ok, pretty busy chart, let's zoom in
Comment: There is a wide range of days between the listing creation and the payout of the loan. There is nothing spectacular here, maybe we check in combination with the loan amount by occupation...
Comment: The values do not differ a lot across the top 10 occupations, nor in comparison to the mean Time to Money (approx. 11 days) and the mean loan amount (approx. $9,300) over all occupations. This means the speed of the listing process does not differ significantly. Whether the Time to Money attribute is a factor for borrowers to use Prosper can't be conclusively proven, as we would need to compare it with e.g. processing times of traditional banks.
What about the interest rates? Let's check the rates, in particular the BorrowerAPR, which includes all fees...
Comment: Loan amount vs. BorrowerAPR by occupation is quite interesting. The mean APR and loan amount are pretty close to each other across occupations; however, Computer Programmers get the best rates on average, 19.10%. Administrative Assistants get the worst rates, 25.59%. Interesting is the fact that loan amounts of e.g. 20k seem to have primarily rates better than the average of the respective occupation group. See the red arrow annotations above in the "Other" group.
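The per-occupation averages quoted here can be produced with a single grouped aggregation. The sketch below uses a few made-up rows, so its numbers are illustrative only and do not reproduce the 19.10% or 25.59% figures:

```python
import pandas as pd

# Made-up loans for three occupations.
toy = pd.DataFrame({
    "Occupation": ["Computer Programmer", "Computer Programmer",
                   "Administrative Assistant", "Administrative Assistant",
                   "Other"],
    "BorrowerAPR": [0.18, 0.20, 0.24, 0.28, 0.22],
    "LoanOriginalAmount": [12000, 8000, 4000, 6000, 9000],
})

# Mean APR, mean loan amount and number of loans per occupation,
# ordered from the best (lowest) to the worst mean rate.
by_occupation = (
    toy.groupby("Occupation")
       .agg(mean_apr=("BorrowerAPR", "mean"),
            mean_amount=("LoanOriginalAmount", "mean"),
            loans=("BorrowerAPR", "size"))
       .sort_values("mean_apr")
)
```

Keeping the loan count next to the means helps to judge how much weight a group's average deserves, since small groups can show extreme means by chance.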
However, whether the rates are good enough to attract borrowers would need to be analyzed with other data, e.g. data on bank loans for similar purposes and terms. To me the rates seem far too high, especially today (and compared to Germany).
This leads us to the question: what influences the BorrowerAPR most?
The majority of borrowers are employed, which is to be expected because it wouldn't be easy to get a loan without a job. To dig further, I will investigate their occupations in the next part.
Comment: Most borrowers on Prosper state that they are a Professional, Computer Programmer, Administrative Assistant, Executive, Teacher, Analyst... All those people have chosen to borrow on Prosper instead of going the conventional way and borrowing from their commercial bank. This could be due to an attractive interest rate offered to these categories of people. That is what we are going to see later.
Comment: It's clear now that for people who are delinquent (defaulted, past due, charged-off payments), banks apply more restrictive credit conditions (higher interest rates). Also, these people have lower credit scores than people with good status.
Comment: As we can see, the two rates have had the same evolution profile. As expected, rates for individuals with collateral(homeowner) are lower than those without collateral, but the gap between the two rates has decreased significantly since 2011.
Comment: This figure confirms what has been said already: the applied borrower rate is higher for unemployed people and delinquent loans (loans with bad records).
Let's review some rating, income and loan attributes....
Comment: We can nicely see that the Prosper Rating is a mighty evaluation metric. We can clearly see that the distribution of APR gets worse from AA to HR. Furthermore, we can see that the rates increase with higher indebtedness (at least slightly) and with decreasing Prosper Rating. The same goes for the estimated loss: it gets slightly higher with increasing indebtedness, but much more so with the assignment to the respective Prosper Rating.